

Research Highlights: R&R: Metric-guided Adversarial Sentence Generation - insideBIGDATA


Large language models are a hot topic in AI research right now, but a more pressing problem looms: we might run out of data to train them on, possibly as early as 2026. Kalyan Veeramachaneni and the team at the MIT Data-to-AI Lab may have found a solution. In their paper on Rewrite and Rollback ("R&R: Metric-Guided Adversarial Sentence Generation"), just published in the Findings of AACL-IJCNLP, they present an R&R framework that can turn low-quality data (from sources like Twitter and 4chan) into high-quality data (texts like those from Wikipedia and industry websites) by rewriting meaningful sentences, thereby increasing the amount of the right type of data available to test and train language models.

The peer-reviewed paper is available here: https://aclanthology.org/2022.findings-aacl.41.pdf

Kalyan Veeramachaneni is a principal research scientist at the MIT Schwarzman College of Computing.
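To give a flavor of the rewrite-and-rollback idea, here is a minimal toy sketch, not the authors' implementation: it proposes word substitutions, scores the sentence with a quality metric, and rolls back any edit that lowers the score. The `quality` function, the `FORMAL` vocabulary, and the `SUBSTITUTIONS` table are all hypothetical stand-ins for the learned metrics and rewriting models used in the actual paper.

```python
import random

# Hypothetical "formal" vocabulary; a stand-in for a learned quality metric.
FORMAL = {"therefore", "significant", "demonstrates", "notable", "research"}

# Hypothetical rewrite candidates (informal word -> more formal word).
SUBSTITUTIONS = {
    "so": "therefore", "big": "significant", "shows": "demonstrates",
    "cool": "notable", "stuff": "research",
}

def quality(words):
    """Toy metric: fraction of words drawn from the formal vocabulary."""
    return sum(w in FORMAL for w in words) / len(words)

def rewrite_and_rollback(sentence, steps=50, seed=0):
    """Greedy sketch: rewrite one word at a time, roll back if quality drops."""
    rng = random.Random(seed)
    words = sentence.split()
    best = quality(words)
    for _ in range(steps):
        i = rng.randrange(len(words))
        new = SUBSTITUTIONS.get(words[i])
        if new is None:
            continue                    # no rewrite candidate for this word
        old, words[i] = words[i], new   # rewrite: apply the substitution
        score = quality(words)
        if score < best:
            words[i] = old              # rollback: reject a harmful edit
        else:
            best = score                # keep the improved sentence
    return " ".join(words)

print(rewrite_and_rollback("so this big study shows cool stuff"))
```

In this sketch the metric only ever accepts edits that keep or raise the score, mirroring the paper's high-level idea of metric-guided rewriting with rollback; the real framework uses far richer similarity and fluency metrics.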